Period Disambiguation with Maxent Model
Abstract
This paper presents our recent work on period disambiguation, the core problem in sentence boundary identification, using a maximum entropy (Maxent) model. We conduct a number of experiments on the PTB-II WSJ corpus to investigate how the context window, the feature space, and lexical information such as abbreviations and sentence-initial words affect learning performance. Such lexical information can be acquired automatically from a training corpus by a learner. Our experimental results show that extending the feature space to integrate these two kinds of lexical information eliminates 93.52% of the errors remaining from the baseline Maxent model, achieving an F-score of 99.8227%.
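The abstract does not list the actual feature templates, so the following is only a minimal sketch of the general idea: collect abbreviation and sentence-initial word lists from a sentence-segmented training corpus, then build context-window features for each candidate period. The function names (`acquire_lexicons`, `period_features`), the feature names, and the heuristic that a period-final token inside a sentence is an abbreviation are all assumptions made for illustration, not the paper's method. The resulting feature dictionaries could be passed to any maximum entropy (multinomial logistic regression) learner.

```python
from collections import Counter


def acquire_lexicons(sentences, min_count=1):
    """Collect abbreviation and sentence-initial word sets from a training
    corpus given as a list of token lists, one list per sentence.

    Assumption (illustrative): a token ending in '.' that is NOT the last
    token of its sentence is treated as an abbreviation.
    """
    abbrev, initial = Counter(), Counter()
    for tokens in sentences:
        if not tokens:
            continue
        initial[tokens[0]] += 1
        for tok in tokens[:-1]:
            if tok.endswith(".") and len(tok) > 1:
                abbrev[tok.lower()] += 1

    def keep(counter):
        return {w for w, n in counter.items() if n >= min_count}

    return keep(abbrev), keep(initial)


def period_features(tokens, i, abbrevs, initials, window=1):
    """Build features for the candidate period at the end of tokens[i]."""
    feats = {}
    # Context-window features: the words around the candidate period.
    for off in range(-window, window + 1):
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats[f"w[{off}]={word.lower()}"] = 1
    # Lexical features drawn from the automatically acquired lists.
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "<PAD>"
    feats["is_abbrev"] = int(tokens[i].lower() in abbrevs)
    feats["next_is_initial"] = int(nxt in initials)
    feats["next_capitalized"] = int(nxt[:1].isupper())
    return feats


# Toy training corpus (hypothetical data, not taken from PTB-II).
train = [
    ["Mr.", "Smith", "arrived", "."],
    ["He", "left", "at", "5", "p.m.", "today", "."],
]
abbrevs, initials = acquire_lexicons(train)

# Unsegmented test text with two candidate periods of interest.
text = ["Mr.", "Smith", "left.", "He", "slept."]
f_abbrev = period_features(text, 0, abbrevs, initials)
f_boundary = period_features(text, 2, abbrevs, initials)
```

In this toy run the period after "Mr." gets `is_abbrev = 1`, while the period after "left." gets `next_is_initial = 1`, which is the kind of signal a classifier can use to separate abbreviation periods from sentence-final ones. Widening `window` adds more context features at the cost of a larger, sparser feature space.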
Similar resources
Reduction of Maximum Entropy Models to Hidden Markov Models
Maximum Entropy (maxent) models are an attractive formalism for statistical models of many types and have been used for a number of purposes, including language modeling (Rosenfeld 1994), part of speech tagging (Ratnaparkhi 1996), prepositional phrase attachment (Ratnaparkhi 1998), sentence breaking (Reynar and Ratnaparkhi 1997) and parsing (Ratnaparkhi 1997). Maxent models allow the combinatio...
A Maximum Entropy Approach To Disambiguating VerbNet Classes
This paper focuses on verb sense disambiguation cast as inferring the VerbNet class to which a verb belongs. To train three different supervised learning models –Maximum Entropy (MaxEnt), Naive Bayes and Decision Tree– we used lexical, co-occurrence and typed-dependency features. For each model, we built three classifiers: one single classifier for all verbs, one single classifier for polysemou...
MELB-YB: Preposition Sense Disambiguation Using Rich Semantic Features
This paper describes a maxent-based preposition sense disambiguation system entry to the preposition sense disambiguation task of the SemEval 2007. This system uses a wide variety of semantic and syntactic features to perform the disambiguation task and achieves a precision of 69.3% over the test data.
What is it? Disambiguating the different readings of the pronoun 'it'
In this paper, we address the problem of predicting one of three functions for the English pronoun ‘it’: anaphoric, event reference or pleonastic. This disambiguation is valuable in the context of machine translation and coreference resolution. We present experiments using a MAXENT classifier trained on gold-standard data and self-training experiments of an RNN trained on silver-standard data, ...
Experiments on Sense Annotations and Sense Disambiguation of Discourse Connectives
Discourse connectives can be analyzed as discourse level predicates which project predicate-argument structure on a par with verbs at the sentence level. The Penn Discourse Treebank (PDTB) reflects this view in its design providing annotation of the discourse connectives and their arguments. Like verbs, discourse connectives have multiple senses. We present a set of manual sense annotation stud...